Best MoE Model AI Tools & Models - Premium MoE Model News

AI News

Four Mac Studios Overcome Cloud Clusters! Apple Teams Up with LM Studio to Run Trillion-Parameter Large Models Locally

At WWDC 2026, LM Studio and Apple demoed Kimi K2.6, a trillion-parameter MoE model from Moonshot, on just four Mac Studios. With 1T total parameters and 32B active, it supports long-context, multimodal, and agent tasks, challenging the cloud GPU monopoly and bringing frontier AI to consumer hardware...

11.7k 9 minutes ago

Locally Run Trillion-Parameter Models: Apple and LM Studio Team Up to Unlock the Full Potential of Mac Studio

At WWDC 2026, LM Studio and Apple demoed Moonshot AI's trillion-parameter MoE model Kimi K2.6 running smoothly on a cluster of four Mac Studios. This challenges the assumption that large models require the cloud, proving consumer hardware can handle cutting-edge AI and marking a milestone for local deployment...

11.1k 5 hours ago

Tsinghua University and Tencent Hunyuan Win the MLSys2026 MoE Inference Challenge with a 4.1x Speedup on NPU

The Tsinghua University Storage Lab and the Tencent Hunyuan AI Infra team won the global championship in the MLSys2026 MoE Model Inference Optimization Challenge. To address the inference bottlenecks of the trillion-parameter mixture-of-experts (MoE) architecture on heterogeneous NPUs, the joint team designed a full-chain optimization solution that includes the E-Shard strategy, PSUM three-dimensional tensor batch reading, and a GEMV path, significantly improving performance.

10.8k 7 hours ago

iFlytek Spark X2-Flash Model Launch: Focusing on Domestic Computing Power, 256K Long Text Capability Upgrade

iFlytek released the Spark X2-Flash model, which uses an MoE architecture with 30B total parameters, supports a 256K ultra-long context, and was trained entirely on Huawei Ascend 910B clusters, marking a new efficiency stage for large models in the domestic computing ecosystem...

21.4k yesterday

AI Products

Wan2.2

The world's first open-source MoE video generation model, supporting text/image to 720P video conversion

MoE architecture

6.9k

Moonlight-16B-A3B

Moonlight-16B-A3B is a 16B parameter Mixture-of-Experts (MoE) model trained with the Muon optimizer for efficient language generation.

AI model

13k

Moonlight

Moonlight is a 16B parameter Mixture of Experts (MoE) model trained with the Muon optimizer, delivering exceptional performance.

AI model

9.6k

Qwen2.5-Max

Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model designed to enhance model intelligence.

AI model

46.9k

Models

| Model | Provider | Input price (per M tokens) | Output price (per M tokens) | Context Length |
| --- | --- | --- | --- | --- |
| Gemini 2.0 Flash-Lite | Google | $0.49 | $2.1 | not listed |
| GPT-4.1 mini | OpenAI | $2.8 | $11.2 | not listed |
| Grok 4 Fast | xAI | $1.4 | $3.5 | not listed |
| o3-mini | OpenAI | $7.7 | $30.8 | 200 |
| GPT-5 Codex | OpenAI | not listed | not listed | not listed |
| Claude 3 Opus | Anthropic | $105 | $525 | 200 |
| Gemini 2.0 Flash | Google | $0.7 | $2.8 | not listed |
| Claude Haiku 4.5 | Anthropic | not listed | $35 | 200 |
| Gemini 2.5 Flash | Google | $2.1 | $17.5 | not listed |
| Claude Sonnet 4.5 | Anthropic | $21 | $105 | 200 |
| Claude 3 Sonnet | Anthropic | $21 | $105 | 200 |
| Gemini 2.5 Flash-Lite | Google | $0.7 | $2.8 | not listed |
| qwen3-vl-235b-a22b-thinking | Alibaba | not listed | $20 | not listed |
| qwen3-coder-plus | Alibaba | not listed | $16 | not listed |
| wan2.5-i2i-preview | Alibaba | not listed | not listed | not listed |
| Qianfan-Lightning | Baidu | not listed | not listed | 128 |
| qwen3-max | Alibaba | not listed | $24 | 256 |
| qwen3-vl-plus | Alibaba | not listed | $10 | 256 |
| qwen-image-plus | Alibaba | not listed | not listed | not listed |
| qwen-image-edit | Alibaba | not listed | not listed | not listed |

MCP

Mcp Stocks Info Server

The MOEX Stocks & News MCP Server is an interface service based on the Model Context Protocol, which provides functions for querying and analyzing stock quotes and financial news of the Moscow Exchange, and supports integration with large language models.

13k

2.0 points

Models

INTELLECT 3 MXFP4_MOE GGUF

GigaChat3 10B A1.8B GGUF

INTELLECT 3 FP8

Wan2.2 I2V A14B Diffusers

GigaChat3 10B A1.8B Bf16

GigaChat3 10B A1.8B Base

Cerebras_MiniMax M2 REAP 139B A10B GGUF

Moondream3 Preview Hf

MiniMax M2 GGUF

Qwen3 VL 235B A22B Instruct GGUF

Qwen3 VL 30B A3B Instruct GGUF

Qwen3 VL 30B A3B Thinking GGUF

MiniMax M2 GGUF

MiniMax M2

Qwen3 Next 80B A3B Thinking GGUF

Ling 1T GGUF

Ming Flash Omni Preview

Deepseek Moe 16b Q4 K M Cpu Offload Gguf

Gpt Oss 120b Eagle3 V2

Gpt Oss 20b Moe Cpu Offload Gguf